Botany 2022

00-Setup

Install

This workshop was built with R 4.2.1; we suggest updating your R before starting.
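To check which version you are currently running, base R provides:

## Print the running R version
R.version.string
## Compare against the version used to build this workshop
getRversion() >= "4.2.1"

If the comparison returns FALSE, update R before continuing.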

Prior to starting this workshop, you should run the following lines of code to install and update the packages needed for these demos.

## Make a list of packages
list_of_packages <- c("dplyr", 
                      "tidyr",
                      "plyr", 
                      "spocc", 
                      "ridigbio",
                      "tibble", 
                      "tidyverse",
                      "rbison",
                      "CoordinateCleaner",
                      "lubridate",
                      "ggplot2",
                      "gtools",
                      "raster", 
                      "sp", 
                      "spatstat", 
                      "spThin", 
                      "fields", 
                      "ggspatial", 
                      "rgdal", 
                      "rangeBuilder", 
                      "sf", 
                      "dismo", 
                      "devtools", 
                      "ENMeval", 
                      "caret", 
                      "usdm", 
                      "stringr", 
                      "factoextra", 
                      "FactoMineR", 
                      "multcompView", 
                      "ggsci",
                      "gridExtra", 
                      "ecospat", 
                      "rJava", 
                      "viridis", 
                      "ENMTools", 
                      "ape", 
                      "RStoolbox", 
                      "hypervolume", 
                      "phytools",
                      "picante", 
                      "leaflet")


## Here we identify which packages are not installed and install these for you
new.packages <- list_of_packages[!(list_of_packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)

## Check version of packages
### Compare to package version used for demo creation
### This data frame has two columns: (1) package (2) version, indicating which
### version of the package was used to develop this workshop
versiondf <- read.csv("data/setup/setup.csv", stringsAsFactors = FALSE)
### Save your version under a new column named "current_version"
versiondf$current_version <- as.character(do.call(c, lapply(list_of_packages, packageVersion)))
### Compare the version the workshop was developed with to your current version
updatelist <- versiondf[which(versiondf$version != versiondf$current_version), ]
### Update packages with old versions
lapply(as.character(updatelist$package), install.packages)

## Make sure all packages load
## Packages that are FALSE did not load
### If anything prints here, then something went wrong and a package did not install
loaded <- unlist(lapply(list_of_packages, require, character.only = TRUE))
list_of_packages[loaded == FALSE]
# Install packages not in CRAN
library(devtools)
install_github('johnbaums/rmaxent')
install_github("marlonecobos/kuenm")

## Check and make sure all github packages load
github_packages <- c("rmaxent", "kuenm")
github_loaded <- unlist(lapply(github_packages, require, character.only = TRUE))
github_packages[github_loaded == FALSE]

What are these packages?

tidyverse is a collection of R packages used for “everyday data analysis,” including ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, and forcats. We use dplyr and ggplot2 throughout many of the R scripts.

ridigbio is an R package that queries the iDigBio API.

spocc queries data from GBIF, USGS’ Biodiversity Information Serving Our Nation (BISON), iNaturalist, Berkeley Ecoinformatics Engine, eBird, iDigBio, VertNet, Ocean Biogeographic Information System (OBIS), and Atlas of Living Australia (ALA).
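As a small illustration (not part of the workshop scripts, and requiring internet access), a direct spocc query might look like:

library(spocc)
## Query two providers for a handful of records
res <- occ(query = "Galax urceolata", from = c("gbif", "idigbio"), limit = 5)
## Flatten the results into a single data frame
occ2df(res)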

RCurl and rjson are used to query application programming interfaces (APIs) for online databases.

Geospatial packages include sp, raster, maptools, maps, rgdal, RStoolbox, and mapproj.

ggplot2 is used to plot and visualize data. ggbiplot is an add-on to ggplot2.

ENMTools, ENMeval, and dismo are specific to ecological niche modeling.

hypervolume and caret have functions related to statistics.

phytools and picante are related to phylogenetics.

factoextra and FactoMineR are used for statistics related to principal components analysis.



Troubleshooting

  • For spatstat, make sure Xcode is installed on macOS.
system("xcode-select --install")
  • spThin and fields do not compile from source (install the binary versions)!

  • If you are still having issues installing ggbiplot, you may have to uninstall the package ‘backports’ and restart R.



This exercise requires the package ENMTools, which requires Java (the Oracle Java JDK) and the maxent.jar file to be placed in the java folder of the dismo package (dismo is a dependency of ENMTools). To find out where to put the jar file, run the command:

system.file("java", package="dismo")
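For example, a quick check (a sketch using base R) that the jar file is in place:

## Path where dismo expects maxent.jar
jar <- file.path(system.file("java", package = "dismo"), "maxent.jar")
## TRUE once the file has been copied there
file.exists(jar)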

Once your Java and maxent.jar file are set, you are ready to go! If you have more issues, find some help here

To download the jar file to the right directory:

devtools::install_github("johnbaums/rmaxent")
rmaxent::get_maxent(version = "latest", quiet = FALSE)

The previous step may mask rgdal! Make sure to reload it!
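For example, you can simply reattach it after running get_maxent:

## Reattach rgdal so its functions are available again
library(rgdal)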

For rJava, the newest package requires R >= 3.6.0.
dismo only requires rJava >= 0.9-7; download 0.9-7 here

If you are experiencing errors with rJava:

## Remove rJava
remove.packages("rJava")
## Update your R Java configuration 
system("sudo R CMD javareconf")
## Restart R and check to see if the path matches
Sys.getenv("DYLD_FALLBACK_LIBRARY_PATH")
## Reinstall rJava
install.packages("rJava")
library(rJava)

If you are still having issues with rJava - check to see if you are using Java 12 (this should be included in the error message):

system("java -version")

The initial release of Java 12 does not work; install an older version instead. To install an older version, navigate to the Oracle website or use Homebrew.

system("brew cask install homebrew/cask-versions/java11")

Restart R and redo the previous steps. Required: uninstall the newer versions of Java before repeating, and restart R afterward.

## Uninstall Java 12, then repeat the following
## Remove rJava
remove.packages("rJava")
## Update your R Java configuration 
system("sudo R CMD javareconf")
## Restart R and check to see if the path matches
Sys.getenv("DYLD_FALLBACK_LIBRARY_PATH")
## Reinstall rJava
install.packages("rJava")
library(rJava)

More debugging: Check out jerrybreak’s comment from 12/30/2021.

If you have additional issues, please let us know!

01-Download Occurrence Data

ML Gaynor.

Load packages

library(tidyverse)
library(spocc) 
library(ridigbio) 
library(rbison)
library(leaflet)

Load functions

This is a function I created with Natalie Patten. It will be part of her R package gatoRs (Geographic And Taxonomic Occurrence R-based Scrubbing).

source("functions/gators.R")

Data download from iDigBio

Search for the species Galax urceolata.

iDigBio_GU <- idig_search_records(rq=list(scientificname="Galax urceolata"))

Search for the family Diapensiaceae.

iDigBio_GU_family <- idig_search_records(rq=list(family="Diapensiaceae"), limit=1000)

What if you want to read in all the points for a family within an extent?

Hint: Use the iDigBio portal to determine the bounding box for your region of interest.

The bounding box delimits the geographic extent.
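To preview the box before querying, an optional sketch using leaflet (installed above):

library(leaflet)
## Draw the bounding box used below on an interactive map
leaflet() %>%
  addTiles() %>%
  addRectangles(lng1 = -98.16, lat1 = 48.92,
                lng2 = -64.02, lat2 = 23.06)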

rq_input <- list("scientificname"=list("type"="exists"),
                 "family"="Diapensiaceae", 
                 geopoint=list(
                   type="geo_bounding_box",
                   top_left=list(lon = -98.16, lat = 48.92),
                   bottom_right=list(lon = -64.02, lat = 23.06)
                   )
                 )

Search using the input you just made

iDigBio_GU_family_USA <- idig_search_records(rq_input, limit=1000)

Save as csv files

write.csv(iDigBio_GU, "data/download/iDigBio_GU_20220712.csv", 
          row.names = FALSE)
write.csv(iDigBio_GU_family, "data/download/iDigBio_GU_family_20220712.csv", 
          row.names = FALSE)

Data download using spocc_combine

Make synonym lists

Shortia_galacifolia <- c("Shortia galacifolia", "Sherwoodia galacifolia")
Galax_urceolata <- c("Galax urceolata", "Galax aphylla")
Pyxidanthera_barbulata <- c("Pyxidanthera barbulata","Pyxidanthera barbulata var. barbulata")
Pyxidanthera_brevifolia <- c("Pyxidanthera brevifolia", "Pyxidanthera barbulata var. brevifolia")

Use the spocc_combine function

This function downloads records for all names listed from iDigBio, GBIF, and BISON. It keeps specific columns from each database.

spocc_combine(Shortia_galacifolia,
               "data/download/raw/Shortia_galacifolia_raw_20220712.csv")
spocc_combine(Galax_urceolata, 
              "data/download/raw/Galax_urceolata_raw_20220712.csv")
spocc_combine(Pyxidanthera_barbulata, 
              "data/download/raw/Pyxidanthera_barbulata_raw_20220712.csv")
spocc_combine(Pyxidanthera_brevifolia,
              "data/download/raw/Pyxidanthera_brevifolia_raw_20220712.csv")

Quick-look at the downloaded files

Read in downloaded data frame

rawdf <- read.csv("data/download/raw/Shortia_galacifolia_raw_20220712.csv")

Inspect the data frame

What columns are included?

names(rawdf)
##  [1] "name"                          "basis"                        
##  [3] "date"                          "institutionID"                
##  [5] "collectionCode"                "collectionID"                 
##  [7] "country"                       "county"                       
##  [9] "state"                         "locality"                     
## [11] "Latitude"                      "Longitude"                    
## [13] "ID"                            "coordinateUncertaintyInMeters"
## [15] "informationWithheld"           "habitat"                      
## [17] "prov"                          "spocc.latitude"               
## [19] "spocc.longitude"               "spocc.prov"                   
## [21] "spocc.date"                    "spocc.name"

Here is a table of the columns returned for each species in spocc_combine:

Column Description
name scientific name, http://rs.tdwg.org/dwc/terms/scientificName
basis basis of record, http://rs.tdwg.org/dwc/terms/basisOfRecord
date event date, http://rs.tdwg.org/dwc/terms/eventDate
institutionID institution ID, http://rs.tdwg.org/dwc/terms/institutionID
collectionCode collection code, http://rs.tdwg.org/dwc/terms/collectionCode
collectionID collection ID, http://rs.tdwg.org/dwc/terms/collectionID
country country, http://rs.tdwg.org/dwc/terms/country
county county, http://rs.tdwg.org/dwc/terms/county
state stateprovince, http://rs.tdwg.org/dwc/terms/stateProvince
locality http://rs.tdwg.org/dwc/terms/locality or http://rs.tdwg.org/dwc/terms/verbatimLocality
Latitude http://rs.tdwg.org/dwc/terms/decimalLatitude
Longitude http://rs.tdwg.org/dwc/terms/decimalLongitude
ID idigbio = uuid, gbif = key, bison = occurrenceID
coordinateUncertaintyInMeters http://rs.tdwg.org/dwc/terms/coordinateUncertaintyInMeters
habitat http://rs.tdwg.org/dwc/iri/habitat
prov indicates who provided the data: gbif, bison, or idigbio
spocc.* these columns are returned from the spocc::occ2df function

How many observations do we start with?

nrow(rawdf)
## [1] 815

Where are these points?

The error message here indicates many points do not have long/lat values (more in 02).

leaflet(rawdf) %>% 
  addMarkers(label = paste0(rawdf$Longitude, ", ", rawdf$Latitude)) %>% 
  addTiles()

02-Occurrence Data Cleaning

ML Gaynor.

Load Packages

library(dplyr)
library(tidyr)
library(raster)
library(sp)
library(sf)
library(spatstat)
library(spThin)
library(fields)
library(lubridate)
library(CoordinateCleaner)
library(ggplot2)
library(ggspatial)
library(leaflet)

Load functions

This is a function I created with Natalie Patten. It will be part of her R package gatoRs (Geographic And Taxonomic Occurrence R-based Scrubbing).

source("functions/gators.R")

Read in downloaded data frame

rawdf <- read.csv("data/download/raw/Shortia_galacifolia_raw_20220712.csv")

Cleaning

Inspect the data frame

How many observations do we start with?

nrow(rawdf)
## [1] 815

1. Resolve taxon names

Inspect scientific names included in the raw df.

unique(rawdf$name)
## [1] "shortia galacifolia"                          
## [2] "sherwoodia galacifolia"                       
## [3] "Shortia galacifolia Torr. & A.Gray"           
## [4] "Sherwoodia galacifolia (Torr. & A.Gray) House"

Create a list of accepted names based on the name column in your data frame

search <-  c("Shortia galacifolia", "Sherwoodia galacifolia")

Filter to only include accepted name:

df <-  filter_fix_names(rawdf, listofsynonyms = search, acceptedname = "Shortia galacifolia")
## [1] "List of synonyms:"
## [1] "Shortia galacifolia"    "Sherwoodia galacifolia"
## [1] "Shortia galacifolia|Sherwoodia galacifolia"
## [1] "Looking at unique names in raw df"
## [1] "shortia galacifolia"                          
## [2] "sherwoodia galacifolia"                       
## [3] "Shortia galacifolia Torr. & A.Gray"           
## [4] "Sherwoodia galacifolia (Torr. & A.Gray) House"
## [1] "Looking at unique names in cleaned df"
## [1] "shortia galacifolia"                          
## [2] "sherwoodia galacifolia"                       
## [3] "Shortia galacifolia Torr. & A.Gray"           
## [4] "Sherwoodia galacifolia (Torr. & A.Gray) House"

How many observations do we have now?

nrow(df)
## [1] 815

2. Decrease number of columns

Merge the two locality columns

df$Latitude <- dplyr::coalesce(df$Latitude, df$spocc.latitude)
df$Longitude <- dplyr::coalesce(df$Longitude, df$spocc.longitude)

If spocc did not download any latitudes or longitudes (i.e., the spocc.latitude and spocc.longitude columns contain no values), skip this step.
The two columns could have different classes; if so, find the fix here
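One possible fix, assuming the mismatch is that the spocc columns were read in as character, is to coerce them to numeric before coalescing:

## Coerce to numeric so coalesce() sees matching classes
df$spocc.latitude <- as.numeric(df$spocc.latitude)
df$spocc.longitude <- as.numeric(df$spocc.longitude)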

Merge the two date columns

df$date <- dplyr::coalesce(df$date, df$spocc.date)

Subset the columns

df <- df %>%
      dplyr::select(ID = ID, 
                    name = new_name, 
                    basis = basis, 
                    coordinateUncertaintyInMeters = coordinateUncertaintyInMeters, 
                    informationWithheld = informationWithheld, 
                    lat = Latitude, 
                    long = Longitude, 
                    date = date)

3. Clean localities

Filtering out NA’s

df <- df %>%
      filter(!is.na(long)) %>%
      filter(!is.na(lat))

How many observations do we have now?

nrow(df)
## [1] 259

Precision

Round to two decimal places

df$lat <- round(df$lat, digits = 2)
df$long <- round(df$long, digits = 2)

Remove unlikely points

Remove points at 0.00, 0.00

df <- df %>%
      filter(long != 0.00) %>%
      filter(lat != 0.00)

Remove coordinates in cultivated zones, botanical gardens, and outside our desired range

df <- CoordinateCleaner::cc_inst(df, 
              lon = "long", 
              lat = "lat", 
              species = "name")
## Testing biodiversity institutions
## Removed 0 records.

Next, we look for geographic outliers and remove outliers.

df <- CoordinateCleaner::cc_outl(df, 
              lon = "long", 
              lat = "lat", 
              species = "name")
## Testing geographic outliers
## Removed 44 records.

How many observations do we have now?

nrow(df)
## [1] 215

4. Remove Duplicates

Fix dates

Parse dates into the same format

df$date <- lubridate::ymd(df$date)

Separate date into year, month, day

df <- df %>%
      mutate(year = lubridate::year(date), 
             month = lubridate::month(date), 
             day = lubridate::day(date))

Remove rows with identical lat, long, year, month, and day

df <- distinct(df, lat, long, year, month, day, .keep_all = TRUE)

How many observations do we have now?

nrow(df)
## [1] 211

5. Spatial Correction

Maxent will only retain one point per pixel. To make the ecological niche analysis comparable, we will retain only one point per pixel.

One point per pixel

Read in raster file
bio1 <- raster("data/climate_processing/bioclim/bio_1.tif")
Set resolution
rasterResolution <- max(res(bio1))
Filter

Remove any point whose nearest-neighbor distance is smaller than the raster resolution (i.e., drop one point from each pair that falls within the same pixel)

while(min(nndist(df[,6:7])) < rasterResolution){
  nnD <- nndist(df[,6:7])
  df <- df[-(which(min(nnD) == nnD) [1]), ]
}

How many observations do we have now?

nrow(df)
## [1] 205

Spatial thinning

Reduce the effects of sampling bias using a randomization approach.

Calculate minimum nearest neighbor distance in km
nnDm <- rdist.earth(as.matrix(data.frame(lon = df$long, lat = df$lat)), miles = FALSE, R = NULL)
nnDmin <- do.call(rbind, lapply(1:5, function(i) sort(nnDm[,i])[2]))
min(nnDmin)
## [1] 0.9109077
Identify points to keep using spThin
keep <- spThin::thin(loc.data =  df, 
        verbose = FALSE, 
        long.col = "long", 
        lat.col = "lat",
        spec.col = "name",
        thin.par = 0.002, # Studies found 2m distance was enough to collect unique genets
        reps = 1, 
        locs.thinned.list.return = TRUE, 
        write.files = FALSE)[[1]]
Filter df to only include those lat/long that have been kept using spThin
df <- df %>%
       filter(lat %in% keep$Latitude &
                long %in% keep$Longitude)
nrow(df)
## [1] 205

6. Plot Cleaned Records

Make points spatial

df_fixed <- st_as_sf(df, coords = c("long", "lat"), crs = 4326)

Set basemap

USA <- borders(database = "usa", colour = "gray50", fill = "gray50")
state <- borders(database = "state", colour = "black", fill = NA)

Plot

Here we are using ggplot2 and ggspatial to plot our occurrence records. We use two basemaps, USA and state. The geom_sf function adds our points based on long and lat and colors them blue. We set the coordinate visualization space with coord_sf and label the x and y axes. Finally, we add a scale bar and north arrow.

simple_map <- ggplot() +
              USA +
              state +
              geom_sf(df_fixed, 
                       mapping = aes(col = name), 
                       col = "blue") +
              coord_sf(xlim = c(min(df$long) - 3, max(df$long) + 3),
                       ylim = c(min(df$lat) - 3, max(df$lat) + 3)) +
              xlab("Longitude") +
              ylab("Latitude") +
              annotation_scale() +
              annotation_north_arrow(height = unit(1, "cm"), 
                                     width = unit(1, "cm"), 
                                     location = "tl")
simple_map

Extra

Another fun way to view these points is with leaflet.

leaflet(df_fixed) %>% 
  addMarkers(label = paste0(df$long, ", ", df$lat)) %>% 
  addTiles()

7. Save Cleaned.csv

write.csv(df, "data/cleaning_demo/Shortia_galacifolia_20220712-cleaned.csv", row.names = FALSE)

8. Make maxent ready

Read in all cleaned files

alldf <- list.files("data/cleaning_demo/", full.names = TRUE, 
                    recursive = FALSE, include.dirs = FALSE, pattern = "*.csv")
alldf <- lapply(alldf, read.csv)
alldf <- do.call(rbind, alldf)

Plot all records

Visually inspect records to verify no additional records should be removed.

## Plot all records
### Make points spatial 
alldf_fixed <- st_as_sf(alldf, coords = c("long", "lat"), crs = 4326)

### Set basemap
USA <- borders(database = "usa", colour = "gray50", fill = "gray50")
state <- borders(database = "state", colour = "black", fill = NA)

### Plot 
all_map <- ggplot() +
            USA +
            state +
            geom_sf(alldf_fixed, 
                    mapping = aes(col = factor(name))) +
            coord_sf(xlim = c(min(alldf$long) - 3, max(alldf$long) + 3),
                     ylim = c(min(alldf$lat) - 3, max(alldf$lat) + 3)) +
            xlab("Longitude") +
            ylab("Latitude") +
            annotation_scale() +
            annotation_north_arrow(height = unit(1, "cm"), 
                                   width = unit(1, "cm"), 
                                   location = "tl")
all_map

Select needed columns

alldf <- alldf %>%
         dplyr::select(name, lat, long)

Save Maxent.csv

write.csv(alldf, "data/cleaning_demo/maxent_ready/diapensiaceae_maxentready_20220712.csv", row.names = FALSE)

03-Georeferencing

Create a batch file for GeoLocate. Created by ME Mabry.

Load Packages

library(dplyr) 
library(tidyr)
library(plyr) 
library(tibble) 

Read in downloaded data frame

rawdf <- read.csv("data/download/raw/Shortia_galacifolia_raw_20220712.csv")

Identify records for georeferencing

Merge the two locality columns

rawdf$Latitude <- dplyr::coalesce(rawdf$Latitude, rawdf$spocc.latitude)
rawdf$Longitude <- dplyr::coalesce(rawdf$Longitude, rawdf$spocc.longitude)

Subset for columns of interest

rawdf <- rawdf %>%
         dplyr::select(ID = ID, 
                      name = name, 
                      basis = basis, 
                      coordinateUncertaintyInMeters = coordinateUncertaintyInMeters, 
                      informationWithheld = informationWithheld,
                      country = country,
                      locality = locality,
                      lat = Latitude, 
                      long = Longitude, 
                      date = date,
                      state = state,
                      county = county)

Filter for NA locality

rawdf_GeoRef <- filter(rawdf, is.na(lat))

Format for GeoLocate

Subset needed columns

rawdf_GeoRef <- rawdf_GeoRef %>%
                dplyr::select("locality string" = locality,
                              country = country,
                              state = state,
                              county = county,
                              latitude = lat, 
                              longitude = long,
                              ID = ID, 
                              name = name, 
                              basis = basis)

Add required columns

rawdf_GeoRef$'correction status' <- ""
rawdf_GeoRef$precision <- ""
rawdf_GeoRef$'error polygon' <- ""
rawdf_GeoRef$'multiple results' <- ""

Reorder

rawdf_GeoRef2<- rawdf_GeoRef[,c("locality string", "country", "state", "county", "latitude",
                                "longitude", "correction status", "precision", "error polygon",
                                "multiple results", "ID", "name", "basis")]

Save file for georeferencing

write.csv(rawdf_GeoRef2,
          file = "data/georeferencing/Shortia_galacifolia_Needing_GeoRef_20220712.csv",
          row.names = FALSE)

04-Climate Processing

Climate layer processing.
Modified and created by ML Gaynor.
Based on code by Mike Belitz (mbelitz/Odo_SDM_Rproj).

Load Packages

library(raster)
library(gtools)
library(dplyr)
library(rgdal)
library(sp)
library(rangeBuilder)
library(sf)
library(caret)
library(usdm)
library(dismo)
library(stringr)

Load functions

Based on code by Mike Belitz (mbelitz/Odo_SDM_Rproj).

source("functions/VIFLayerSelect.R")

Load bioclim layers

biolist <- list.files("data/climate_processing/bioclim/", pattern = "*.tif", full.names = TRUE)

Order list using gtools.

biolist <- mixedsort(sort(biolist))

Load rasters as a stack.

biostack <- raster::stack(biolist)

Load occurrence records

Fix the name column class to make sure it is a character.

alldf <- read.csv("data/cleaning_demo/maxent_ready/diapensiaceae_maxentready_20220712.csv")
alldf$name <- as.character(alldf$name)

Present Layers - all

Here we will make the projection layers, or the layers which include shared space. First, we have to define the accessible space.

Make into a spatial point data frame

pts <- cbind(alldf$long, alldf$lat)
alldfsp <- SpatialPointsDataFrame(pts, data = alldf, proj4string=CRS("EPSG:4326"))

Create alpha hull

hull <- getDynamicAlphaHull(x = alldfsp@coords, 
                             fraction = 1, # min. fraction of records we want included
                             partCount = 1, # number of polygons
                             initialAlpha = 20, # initial alpha size, 20m
                             clipToCoast = "terrestrial",
                             proj = "+proj=longlat +datum=WGS84")
## Warning in proj4string(hull): CRS object has comment, which is lost in output; in tests, see
## https://cran.r-project.org/web/packages/sp/vignettes/CRS_warnings.html
## Warning in sp::spTransform(gshhs, CRS(proj4string(hull))): NULL source CRS
## comment, falling back to PROJ string
## Warning in wkt(obj): CRS object has no comment

Visualize

plot(hull[[1]], col=transparentColor('gray50', 0.5), border = NA)
points(x = alldf$long, y = alldf$lat, cex = 0.5, pch = 3)

Add buffer to hull

Calculate buffer size

First, transform the hull and occurrence points into an equal-area projection with units in meters. Then take the 80th quantile of the nearest-neighbor distances between points.

hullTrans <- spTransform(hull[[1]], "+proj=cea +lat_ts=0 +lon_0=0")
alldfspTrans <- spTransform(alldfsp, "+proj=cea +lat_ts=0 +lon_0=0")
buffDist <- quantile(x = (apply(spDists(alldfspTrans), 2,
                                FUN = function(x) sort(x)[2])), 
                     probs = 0.80, na.rm = TRUE) 
buffDist
##      80% 
## 5862.504
Buffer the hull
buffer_m <- buffer(x = hullTrans, width = buffDist, dissolve = TRUE)
buffer <- spTransform(buffer_m,  CRS("EPSG:4326"))
Visualize
plot(buffer, col=transparentColor('gray50', 0.5), border = NA)
points(x = alldf$long, y = alldf$lat, cex = 0.5, pch = 3)

Mask and crop bioclim layers

path <- "data/climate_processing/all/"
end <- ".asc"
for(i in 1:length(biolist)){
    # Subset raster layer
    rast <- biostack[[i]]
    # Setup file names
    name <- names(rast)
    out <- paste0(path, name)
    outfile <- paste0(out, end)
    # Crop and mask
    c <- crop(rast, extent(buffer))
    c <- mask(c, buffer)
    # Write raster
    writeRaster(c, outfile, format = "ascii", overwrite = TRUE)
}

Select layers for MaxEnt

Warning: Many of these steps are time intensive! Do not run them during the workshop; we have already done these for you.

We only want to include layers that are not highly correlated. To assess which layers to include, we can look at the Pearson correlation coefficient among layers.

Stack all layers

clippedlist <- list.files("data/climate_processing/all/", pattern = "*.asc", full.names = TRUE)
clippedlist <- mixedsort(sort(clippedlist))
clippedstack <- raster::stack(clippedlist)

Then calculate the correlation coefficient

Warning: Time Intensive!

corr <- layerStats(clippedstack, 'pearson', na.rm=TRUE)

Isolate only the Pearson correlation coefficient and take the absolute value.

c <- abs(corr$`pearson correlation coefficient`)

Write file and view in excel

write.csv(c, "data/climate_processing/correlationBioclim.csv", row.names = FALSE)

Highly correlated layers (|r| > 0.80) can impact the statistical significance of the niche models and therefore must be removed.

Randomly select variables to remove

envtCor <- mixedsort(sort(findCorrelation(c, cutoff = 0.80, names = TRUE, exact = TRUE)))
envtCor
##  [1] "bio_1"  "bio_3"  "bio_4"  "bio_6"  "bio_10" "bio_11" "bio_12" "bio_13"
##  [9] "bio_16" "bio_17" "bio_19"

Variable inflation factor (VIF)

Warning: Time Intensive!

VIF detects multicollinearity in a set of multiple regression variables. Here, we run a simple MaxEnt model for every species and calculate the average permutation contribution.
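For comparison, the usdm package can also calculate VIF directly on the raster stack; a quick sketch (the thresholds here are illustrative):

library(usdm)
## Stepwise exclusion of layers with VIF above 10
vifstep(clippedstack, th = 10)
## Or exclude layers using pairwise correlation first, then VIF
vifcor(clippedstack, th = 0.8)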

Run preliminary MaxEnt models

Loop through each species and save the permutation importance in a list.
Warning: This will print warnings even when it works fine.

set.seed(195)
m <- c()
for(i in  1:length(unique(alldf$name))){
    species <- unique(alldf$name)[i]
    spp_df <-  alldf %>%
               dplyr::filter(name == species)
    coordinates(spp_df) <- ~ long + lat
    model <- maxent(x = clippedstack, p = coordinates(spp_df), 
                    progress = "text", silent = FALSE) 
    m[[i]] <- vimportance(model)
}

Bind the dataframes

mc <- do.call(rbind, m)

Calculate the mean and rename columns

mc_average <- aggregate(mc[, 2], list(mc$Variables), mean)
mc_average <- mc_average %>%
              dplyr::select(Variables = Group.1, permutation.importance = x)
mc1 <- mc_average

Select Layers

Use VIF and the MaxEnt permutation importance to select the best variables for your model. Note: when the models are rerun without setting a seed, different layers may be selected because the permutations are random.

selectedlayers <- VIF_layerselect(clippedstack, mc_average)
mixedsort(sort(names(selectedlayers)))

Correct set for workshop purposes

Since this can vary per system (despite setting a seed), we added this line to keep our files consistent for the workshop

sl <- c("bio_3", "bio_7", "bio_8", "bio_9", "bio_14", "bio_15", "bio_18", "elev")
selectedlayers <- raster::subset(clippedstack, sl)

Copy selected layers

Move selected layers to “PresentLayers/all/” for use in MaxEnt later.

for(i in 1:length(names(selectedlayers))){
    name <- names(selectedlayers)[i]
    from <- paste0("data/climate_processing/all/", name, ".asc")
    to <- paste0("data/climate_processing/PresentLayers/all/", name, ".asc")
    file.copy(from, to,
              overwrite = TRUE, recursive = FALSE, 
              copy.mode = TRUE)
}

Create Species Training Layers

for(i in 1:length(unique(alldf$name))){
    species <- unique(alldf$name)[i]
    # Subset species from data frame
    spp_df <-  alldf %>%
               dplyr::filter(name == species)
    # Make spatial
    pts <- cbind(spp_df$long, spp_df$lat)
    spp_df <- SpatialPointsDataFrame(pts, data = spp_df, proj4string=CRS("EPSG:4326"))
    
    ## Create alpha hull
    sphull <- rangeBuilder::getDynamicAlphaHull(x = spp_df@coords, 
                                fraction = 1, # min. fraction of records we want included
                                partCount = 1, # number of polygons
                                initialAlpha = 20, # initial alpha size, 20m
                                clipToCoast = "terrestrial",
                                proj = "+proj=longlat +datum=WGS84")
    
    ### Transform into CRS related to meters
    sphullTrans <- spTransform(sphull[[1]], "+proj=cea +lat_ts=0 +lon_0=0")
    spp_dfTrans <- spTransform(spp_df, "+proj=cea +lat_ts=0 +lon_0=0")
    
    ### Calculate buffer size
    #### Here we take the 80th quantile of the nearest-neighbor distances between points
    spbuffDist <- quantile(x = (apply(spDists(spp_dfTrans), 2, FUN = function(x) sort(x)[2])), 
                         probs = 0.80, na.rm = TRUE) 
    
    ### Buffer the hull
    spbuffer_m <- buffer(x = sphullTrans, width = spbuffDist, dissolve = TRUE)
    spbuffer <- spTransform(spbuffer_m,  CRS("EPSG:4326"))
    ### Crop and Mask
    spec <- gsub(" ", "_", species)
    path <- paste0("data/climate_processing/PresentLayers/", spec,"/")
    end <- ".asc"
    for(j in 1:length(names(selectedlayers))){
        # Subset raster layer
        rast <- selectedlayers[[j]]
        # Setup file names
        name <- names(rast)
        out <- paste0(path, name)
        outfile <- paste0(out, end)
        # Crop and mask
        c <- crop(rast, extent(spbuffer))
        c <- mask(c, spbuffer)
        # Write raster
        writeRaster(c, outfile, format = "ascii", overwrite = TRUE)
    }
}

05-Point Based

Ecological analysis using points. This is based on scripts created by Hannah Owens, James Watling, and Anthony Melton.
Modified and created by ML Gaynor.

Load Packages

library(raster)
library(dplyr)
library(tidyr)
library(gtools)
library(factoextra)
library(FactoMineR)
library(multcompView)
library(ggsci)
library(gridExtra)
library(ggplot2)
library(ecospat)
library(dismo)
library(ade4)

Load functions

This function is from vqv/ggbiplot.

source("functions/ggbiplot_copy.R")

Load data file

alldf <- read.csv("data/cleaning_demo/maxent_ready/diapensiaceae_maxentready_20220712.csv")

Raster layers

list <- list.files("data/climate_processing/PresentLayers/all", full.names = TRUE, recursive = FALSE) 
list <- mixedsort(sort(list))
envtStack <- stack(list)

ENM - Realized Niche

Extract value for each point

For each occurrence record, extract the value of each bioclim variable using the package raster.

ptExtracted <- raster::extract(envtStack, alldf[3:2])

Convert to data frame

ptExtracteddf <- as.data.frame(ptExtracted)

Add species name

ptExtracteddf <- ptExtracteddf %>%
                 dplyr::mutate(name = as.character(alldf$name), x = alldf$long, y = alldf$lat)

Drop any NA

ptExtracteddf <- ptExtracteddf %>% 
                 tidyr::drop_na(bio_3, bio_7, bio_8, bio_9, bio_14, bio_15, bio_18, elev)

PCA

Create two objects: one with only the bioclim variables and one with the species names.

data.bioclim <- ptExtracteddf[, 1:8]
data.species <- ptExtracteddf[, 9]

Use only the bioclim columns to run the principal component analysis.

data.pca <- prcomp(data.bioclim, scale. = TRUE) 

Understanding the PCA - Optional

When you use the command prcomp, the loadings are stored as the rotation element of the result. Thanks to a really great answer on Stack Overflow, you can convert the rotation values to show each variable's relative contribution.

loadings <- data.pca$rotation
summary(loadings)
##       PC1               PC2               PC3                PC4           
##  Min.   :-0.4412   Min.   :-0.2724   Min.   :-0.64672   Min.   :-0.424176  
##  1st Qu.:-0.3611   1st Qu.:-0.2104   1st Qu.:-0.39439   1st Qu.:-0.261111  
##  Median :-0.3191   Median : 0.2210   Median : 0.05544   Median :-0.002404  
##  Mean   :-0.1145   Mean   : 0.1396   Mean   :-0.06823   Mean   : 0.026802  
##  3rd Qu.: 0.1794   3rd Qu.: 0.3399   3rd Qu.: 0.22268   3rd Qu.: 0.195788  
##  Max.   : 0.4026   Max.   : 0.6699   Max.   : 0.35440   Max.   : 0.693426  
##       PC5               PC6                PC7               PC8          
##  Min.   :-0.3757   Min.   :-0.76021   Min.   :-0.6064   Min.   :-0.46111  
##  1st Qu.:-0.2111   1st Qu.:-0.38829   1st Qu.:-0.4301   1st Qu.:-0.01278  
##  Median : 0.3307   Median :-0.02424   Median :-0.1909   Median : 0.14040  
##  Mean   : 0.1403   Mean   :-0.14128   Mean   :-0.1633   Mean   : 0.14398  
##  3rd Qu.: 0.4042   3rd Qu.: 0.05884   3rd Qu.: 0.1460   3rd Qu.: 0.27908  
##  Max.   : 0.4495   Max.   : 0.32091   Max.   : 0.2800   Max.   : 0.62820

There are two options to convert the loadings to relative contributions; both give the same answer, so either can be used.

Option 1
loadings_relative_A <- t(t(abs(loadings))/rowSums(t(abs(loadings))))*100
loadings_relative_A
##              PC1       PC2       PC3       PC4       PC5        PC6       PC7
## bio_3  12.351031  7.817062 15.694846 30.253573 13.782057  2.5529480 11.192623
## bio_7  14.392069 10.152713 18.714155  1.724893 15.488151  2.3322268 24.245310
## bio_8  14.793350 12.217596  8.970650  1.934623 10.321323 36.9229504  5.325418
## bio_9  11.101897  9.380471 26.974050 14.538194 13.689124 19.2400201  5.883375
## bio_14 16.211651  7.530844  1.611266 18.506444 14.478171  0.1983369  7.364618
## bio_15  3.991946 26.060031  3.013367  5.734652  7.455848 15.5864265  9.377266
## bio_18 12.958284 16.245219 10.239994 10.343363 16.378703  4.4353670 16.080716
## elev   14.199772 10.596062 14.781672 16.964258  8.406624 18.7317244 20.530674
##               PC8
## bio_3   0.1743179
## bio_7   7.2956146
## bio_8   5.6968636
## bio_9   1.8425332
## bio_14 29.0667559
## bio_15 26.0560908
## bio_18 21.3357900
## elev    8.5320340
Option 2
loadings_relative_B <- sweep(x = abs(loadings), 
                             MARGIN = 2, 
                             STATS = colSums(abs(loadings)), FUN = "/")*100
loadings_relative_B

Plotting the PCA

Set theme

First, I made a theme to change the background of the plot. Next, I changed the plot margins and the text size.

theme <- theme(panel.background = element_blank(),
               panel.border=element_rect(fill=NA),
               panel.grid.major = element_blank(),
               panel.grid.minor = element_blank(),
               strip.background=element_blank(),
               axis.ticks=element_line(colour="black"),
               plot.margin=unit(c(1,1,1,1),"line"), 
               axis.text = element_text(size = 12), 
               legend.text = element_text(size = 12), 
               legend.title = element_text(size = 12),
               text = element_text(size = 12))
Set color palette
pal <- pal_locuszoom()(4)
ggbiplot

Next, use ggbiplot, where obs.scale indicates the scale factor to apply to observations, var.scale indicates the scale factor to apply to variables, ellipse = TRUE draws a normal data ellipse for each group, and circle = TRUE draws a correlation circle.

g <- ggbiplot(data.pca, obs.scale = 1, var.scale = 1, 
              groups = data.species, ellipse = TRUE, circle = TRUE) +
     scale_color_manual(name = '', values = pal) +
     theme(legend.direction = 'vertical', legend.position = 'right', 
           legend.text = element_text(size = 12, face = "italic")) +
     theme
 
g


ANOVA

A simple function to run an ANOVA and a post-hoc Tukey HSD test.

stat.test <- function(dataframe, x = "name", y){
    # ANOVA of variable y among species
    bioaov <- aov(as.formula(paste0(y, "~", x)), data = dataframe) 
    # Post-hoc Tukey HSD, summarized with a compact letter display
    TH <- TukeyHSD(bioaov, "name")
    m <- multcompLetters(TH$name[, 4])
    # y position for the group letters, just above the largest value
    y.val <- as.numeric(max(dataframe[[y]]) + 1)
    groups <- data.frame(groups = m$Letters, name = names(m$Letters), 
                         y.val = rep(y.val, length(m$Letters)))
    return(groups)
}

Run for bio_3 only

bio3aovplot <- ggplot(ptExtracteddf, aes(x = name, y = bio_3)) +
               geom_boxplot(aes(fill = name)) +
               scale_fill_manual(name = '', values = pal) +
               geom_text(data = stat.test(dataframe = ptExtracteddf, y = "bio_3"), 
                         mapping = aes(x = name,
                                       y = max(ptExtracteddf["bio_3"]) + 1, 
                                       label = groups), 
                         size = 5, inherit.aes = FALSE) +
               theme(axis.text.x = element_text(angle = 90, size = 8, face = 'italic'))
bio3aovplot

Loop through all variables

variablelist <- colnames(ptExtracteddf)[1:8]
plotlist <- list()
maxstats <- c()
statsout <- c()

for(i in 1:8){
  vname <- variablelist[i]
  maxstats[i] <- as.numeric(max(ptExtracteddf[[vname]]) + 1)
  statsout[[i]] <- stat.test(dataframe = ptExtracteddf, y = vname)
  plotlist[[i]] <- ggplot(ptExtracteddf, aes(x = name, y = .data[[vname]])) +
                    geom_boxplot(aes(fill = name)) +
                    scale_fill_manual(name = 'Species', values = pal) +
                    geom_text(data = statsout[[i]], 
                              mapping = aes(x = name,
                                            y = y.val, 
                                            label = groups), 
                              size = 5, inherit.aes = FALSE) +
                    scale_x_discrete(labels = c('G', 'Pba','Pbr', 'S')) +
                    ggtitle(label = vname) +
                    ylab(vname) +
                    theme(legend.position = "none") +
                   ylim(0, (maxstats[i]*1.2))
}

gridExtra::grid.arrange(grobs = plotlist)
## Warning: Removed 15 rows containing non-finite values (stat_boxplot).
## Warning: Removed 1 rows containing non-finite values (stat_boxplot).


Ecospat Niche Overlap and Niche Equivalency

Set up background points

bg1 <- randomPoints(mask = envtStack, n = 1000, p = alldf[,3:2])
bg1.env <- raster::extract(envtStack, bg1)
bg1.env <- data.frame(bg1.env)
allpt.bioclim <- rbind(bg1.env, data.bioclim)  

dudi.PCA to reduce variables

pca.env <- dudi.pca(allpt.bioclim,
                    center = TRUE, # Center by the mean
                    scannf = FALSE, # Don't plot
                    nf = 2) # Number of axes to keep 

Pull out scores for each species

p1.score <- suprow(pca.env, dplyr::filter(ptExtracteddf, name == "Galax urceolata")[, 1:8])$li
p2.score <- suprow(pca.env, dplyr::filter(ptExtracteddf, name == "Pyxidanthera barbulata")[, 1:8])$li
p3.score <- suprow(pca.env, dplyr::filter(ptExtracteddf, name == "Pyxidanthera brevifolia")[, 1:8])$li
p4.score <- suprow(pca.env, dplyr::filter(ptExtracteddf, name == "Shortia galacifolia")[, 1:8])$li
scores.clim <- pca.env$li
Visualize
plot(scores.clim, pch = 16, asp = 1,
     col = adjustcolor(1, alpha.f = 0.2), cex = 2,
     xlab = "PC1", ylab = "PC2") 
points(p1.score, pch = 18, col = pal[1], cex = 2)
points(p2.score, pch = 18, col = pal[2], cex = 2)
points(p3.score, pch = 18, col = pal[3], cex = 2)
points(p4.score, pch = 18, col = pal[4], cex = 2)

Kernel density estimates

Create occurrence density grids based on the ordination data.

z1 <- ecospat.grid.clim.dyn(scores.clim, scores.clim, p1.score, R = 100)
## Registered S3 methods overwritten by 'adehabitatMA':
##   method                       from
##   print.SpatialPixelsDataFrame sp  
##   print.SpatialPixels          sp
z2 <- ecospat.grid.clim.dyn(scores.clim, scores.clim, p2.score, R = 100)
z3 <- ecospat.grid.clim.dyn(scores.clim, scores.clim, p3.score, R = 100)
z4 <- ecospat.grid.clim.dyn(scores.clim, scores.clim, p4.score, R = 100)
zlist  <- list(z1, z2, z3, z4)

Niche Overlap

Schoener’s D ranges from 0 to 1, where 0 represents no overlap between the two niche spaces and 1 represents completely identical niche space.
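To make the statistic concrete, here is a minimal base-R sketch of Schoener’s D on toy density vectors. The schoenerD helper and its inputs are hypothetical; ecospat.niche.overlap() applies the same formula to the occupancy grids built below.

```r
## Schoener's D on two toy density vectors:
## D = 1 - 0.5 * sum(|p1 - p2|), after scaling each density to sum to 1
schoenerD <- function(d1, d2){
  p1 <- d1 / sum(d1)
  p2 <- d2 / sum(d2)
  1 - 0.5 * sum(abs(p1 - p2))
}
schoenerD(c(1, 2, 3, 4), c(1, 2, 3, 4))  # identical densities -> 1
schoenerD(c(1, 0, 0, 0), c(0, 0, 0, 1))  # disjoint densities -> 0
```

The same interpretation carries over to the grids: identical occurrence densities give D = 1, fully disjoint ones give D = 0.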

overlapD <- matrix(ncol = 2, nrow = 6)
n <- 1
for(i in 1:3){
  for(j in (i + 1):4){
    overlapD[n, 1] <- paste0("z", i, "-", "z", j)
    overlapD[n, 2] <- ecospat.niche.overlap(zlist[[i]], zlist[[j]], cor = TRUE)$D
    n <- n + 1
  }
}

overlapDdf <- data.frame(overlapD)
overlapDdf
##      X1                  X2
## 1 z1-z2   0.085115840903214
## 2 z1-z3 0.00641344441404568
## 3 z1-z4   0.476780105514052
## 4 z2-z3  0.0115479048153222
## 5 z2-z4 0.00566124942172497
## 6 z3-z4                   0
Niche Overlap Visualization
par(mfrow=c(1,2))
ecospat.plot.niche.dyn(z1, z4, quant=0.25, interest = 1
                       , title= "Niche Overlap - Z1 top", name.axis1="PC1", name.axis2="PC2")
ecospat.plot.niche.dyn(z1, z4, quant=0.25, interest = 2
                       , title= "Niche Overlap - Z4 top", name.axis1="PC1", name.axis2="PC2")

Niche Equivalency Test

Based on Warren et al. 2008 - Are the two niches identical?
This is a hypothesis test for D, with the null distribution built by randomization. H1: the niche overlap is higher than expected by chance (i.e., when occurrences are randomized).
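The logic of the randomization test can be sketched in base R. The observed D and null values below are toy numbers, not output from this demo; ecospat builds the real null distribution by reshuffling occurrences between the two niches.

```r
## Toy observed overlap and a toy null distribution of D values
obs_D  <- 0.48
null_D <- c(0.21, 0.35, 0.30, 0.44, 0.28, 0.19, 0.33, 0.40, 0.25, 0.37)
## One-sided p-value: is the observed overlap higher than expected by chance?
p_val <- (sum(null_D >= obs_D) + 1) / (length(null_D) + 1)
p_val  # -> 1/11, about 0.091
```

With only 10 replicates (as in the demo call below), the smallest attainable p-value is 1/11, which is why more replicates are used in real analyses.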

eq.test <- ecospat.niche.equivalency.test(z1, z4, rep = 10)
ecospat.plot.overlap.test(eq.test, "D", "Equivalency")

Niche Similarity Test

Based on Warren et al. 2008 - Are the two niches similar?
Can one species’ niche predict the occurrences of a second species better than expected by chance?

sim.test <- ecospat.niche.similarity.test(z1, z4, rep = 10, rand.type=2)
ecospat.plot.overlap.test(sim.test, "D", "Similarity")

06-Ecological Niche Modeling

Ecological Niche Modeling in R.
Modified and created by Anthony Melton and ML Gaynor.

This script is for generating and testing ENMs using ENMeval. Please see the paper describing ENMeval and its vignette.

Set up java memory

options(java.parameters = "-Xmx16g") # increase memory available to Java

Load Packages

library(raster)
library(gtools)
library(dplyr)
library(dismo)
library(ENMeval)
library(ggplot2)
library(viridis)
library(knitr)      # for kable()
library(kableExtra) # for kable_styling() and scroll_box()

Load Function

source("functions/ENMevaluation.R")

Load data file

alldf <- read.csv("data/cleaning_demo/maxent_ready/diapensiaceae_maxentready_20220712.csv")

Subset for each species

Galax_urceolata <- dplyr::filter(alldf, name == "Galax urceolata")
Pyxidanthera_barbulata <- dplyr::filter(alldf, name == "Pyxidanthera barbulata")
Pyxidanthera_brevifolia <- dplyr::filter(alldf, name == "Pyxidanthera brevifolia")
Shortia_galacifolia <- dplyr::filter(alldf, name == "Shortia galacifolia")

Raster layers

list <- list.files("data/climate_processing/PresentLayers/all", full.names = TRUE, recursive = FALSE) 
list <- mixedsort(sort(list))
allstack <- stack(list)

Read in species training layers

gstack <- stack(mixedsort(sort(list.files("data/climate_processing/PresentLayers/Galax_urceolata/", full.names = TRUE))))
pbastack <- stack(mixedsort(sort(list.files("data/climate_processing/PresentLayers/Pyxidanthera_barbulata/", full.names = TRUE))))
pbrstack <- stack(mixedsort(sort(list.files("data/climate_processing/PresentLayers/Pyxidanthera_brevifolia/", full.names = TRUE))))
sstack <- stack(mixedsort(sort(list.files("data/climate_processing/PresentLayers/Shortia_galacifolia/", full.names = TRUE))))

Fix projection

projection(allstack) <- "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"
projection(gstack) <- "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"
projection(pbastack) <- "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"
projection(pbrstack) <- "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"
projection(sstack) <- "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"

Model Generation

dismo Model Generations

Warning: time-intensive on the Apple M1 chip! These arguments match the MaxEnt GUI settings (see the Word document and PowerPoint).

evaldis <- dismo::maxent(x = gstack, p = Galax_urceolata[, c("long", "lat")], nbg = 10000,
                         args = c("projectionlayers=data/climate_processing/PresentLayers/all",
                                  "responsecurves", "jackknife",  
                                  "outputformat=logistic","randomseed", 
                                  "randomtestpoints=25",  "replicates=5", 
                                  "replicatetype=subsample",  "maximumiterations=5000",
                                  "writebackgroundpredictions","responsecurvesexponent",
                                  "writeplotdata"), 
                         removeDuplicates = TRUE
                         #,path = "data/Ecological_Niche_Modeling/enm_output/Galax_urceolata/"
              )
## This is MaxEnt version 3.4.3

ENMeval Model generation

ENMeval will generate multiple models and test them with the specified test method. Two important settings here are the regularization multiplier (RM) and the feature classes (FC). FC controls which response-curve shapes (linear, hinge, quadratic, product, and threshold) can be used in the model, and RM influences how many parameters are retained in the model.

eval1 <- ENMeval::ENMevaluate(occ = Galax_urceolata[, c("long", "lat")], 
                              env = gstack,
                              tune.args = list(fc = c("L","Q"), rm = 1:2), 
                              partitions = "block",
                              n.bg = 10000,
                              parallel = FALSE,
                              algorithm = 'maxent.jar', 
                              user.eval = proc)
## This is MaxEnt version 3.4.3

Model Statistics

dismo

Inspect the dismo model based on the html

browseURL(evaldis@html)

ENMeval

Inspect the many models

Visualize

maps <- eval1@predictions
plot(maps)

Look at model overlap

mod_overlap <- calc.niche.overlap(eval1@predictions, overlapStat = "D")
kable(mod_overlap) %>%
   kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
   scroll_box(width = "100%", height = "200px")
            fc.L_rm.1 fc.Q_rm.1 fc.L_rm.2 fc.Q_rm.2
fc.L_rm.1          NA        NA        NA        NA
fc.Q_rm.1   0.8931787        NA        NA        NA
fc.L_rm.2   0.9868645 0.8866661        NA        NA
fc.Q_rm.2   0.8999516 0.9818840 0.8946668        NA

Inspect the results

Identify the best model by selecting the model with the lowest average test omission rate and, among those, the highest average validation AUC.

results <- eval.results(eval1)
opt.seq <- results %>% 
            dplyr::filter(or.10p.avg == min(or.10p.avg)) %>% 
            dplyr::filter(auc.val.avg == max(auc.val.avg))
kable(opt.seq) %>%
   kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
   scroll_box(width = "100%", height = "200px")
fc                   Q
rm                   2
tune.args            fc.Q_rm.2
auc.train            0.8524554
cbi.train            0.998
auc.diff.avg         0.0855877
auc.diff.sd          0.0364185
auc.val.avg          0.8278837
auc.val.sd           0.0763053
cbi.val.avg          0.8855
cbi.val.sd           0.0469503
or.10p.avg           0.1284263
or.10p.sd            0.0727586
or.mtp.avg           0.0041322
or.mtp.sd            0.0047715
proc_auc_ratio.avg   1.916903
proc_auc_ratio.sd    0.0287362
proc_pval.avg        0
proc_pval.sd         0
AICc                 11854.65
delta.AICc           8.744809
w.AIC                0.0061021
ncoef                7

Subset model

mod.seq <- eval.models(eval1)[[opt.seq$tune.args]]

Inspect

kable(mod.seq@results) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 10) %>%
  scroll_box(width = "100%", height = "200px")
X.Training.samples 482.0000
Regularized.training.gain 0.8771
Unregularized.training.gain 0.9226
Iterations 180.0000
Training.AUC 0.8366
X.Background.points 10474.0000
bio_14.contribution 13.1895
bio_15.contribution 6.2362
bio_18.contribution 3.2498
bio_3.contribution 14.0114
bio_7.contribution 4.0078
bio_8.contribution 13.7527
bio_9.contribution 7.5670
elev.contribution 37.9857
bio_14.permutation.importance 0.0000
bio_15.permutation.importance 37.3287
bio_18.permutation.importance 1.3948
bio_3.permutation.importance 11.8152
bio_7.permutation.importance 39.2240
bio_8.permutation.importance 1.6956
bio_9.permutation.importance 8.5417
elev.permutation.importance 0.0000
Entropy 8.3787
Prevalence..average.probability.of.presence.over.background.sites. 0.2383
Fixed.cumulative.value.1.cumulative.threshold 1.0000
Fixed.cumulative.value.1.Cloglog.threshold 0.0560
Fixed.cumulative.value.1.area 0.8887
Fixed.cumulative.value.1.training.omission 0.0124
Fixed.cumulative.value.5.cumulative.threshold 5.0000
Fixed.cumulative.value.5.Cloglog.threshold 0.1056
Fixed.cumulative.value.5.area 0.6944
Fixed.cumulative.value.5.training.omission 0.0560
Fixed.cumulative.value.10.cumulative.threshold 10.0000
Fixed.cumulative.value.10.Cloglog.threshold 0.1505
Fixed.cumulative.value.10.area 0.5399
Fixed.cumulative.value.10.training.omission 0.0975
Minimum.training.presence.cumulative.threshold 0.0166
Minimum.training.presence.Cloglog.threshold 0.0154
Minimum.training.presence.area 0.9931
Minimum.training.presence.training.omission 0.0000
X10.percentile.training.presence.cumulative.threshold 10.1387
X10.percentile.training.presence.Cloglog.threshold 0.1519
X10.percentile.training.presence.area 0.5364
X10.percentile.training.presence.training.omission 0.0996
Equal.training.sensitivity.and.specificity.cumulative.threshold 29.6595
Equal.training.sensitivity.and.specificity.Cloglog.threshold 0.3221
Equal.training.sensitivity.and.specificity.area 0.2220
Equal.training.sensitivity.and.specificity.training.omission 0.2220
Maximum.training.sensitivity.plus.specificity.cumulative.threshold 39.0791
Maximum.training.sensitivity.plus.specificity.Cloglog.threshold 0.4156
Maximum.training.sensitivity.plus.specificity.area 0.1349
Maximum.training.sensitivity.plus.specificity.training.omission 0.2946
Balance.training.omission..predicted.area.and.threshold.value.cumulative.threshold 1.8327
Balance.training.omission..predicted.area.and.threshold.value.Cloglog.threshold 0.0711
Balance.training.omission..predicted.area.and.threshold.value.area 0.8357
Balance.training.omission..predicted.area.and.threshold.value.training.omission 0.0145
Equate.entropy.of.thresholded.and.original.distributions.cumulative.threshold 15.8214
Equate.entropy.of.thresholded.and.original.distributions.Cloglog.threshold 0.2044
Equate.entropy.of.thresholded.and.original.distributions.area 0.4156
Equate.entropy.of.thresholded.and.original.distributions.training.omission 0.1452

Look at variable contribution

plot(mod.seq)

Look at the response curves

dismo::response(mod.seq)

Project model to allstack

p <- predict(mod.seq, allstack) 

Visualize

# Make into a data frame
p_df <-  as.data.frame(p, xy = TRUE)

# Plot
ggplot() +
  geom_raster(data = p_df, aes(x = x, y = y, fill = layer)) +
  geom_point(data= Galax_urceolata, 
             mapping = aes(x = long, y = lat), 
             col='red', cex=0.05) +
  coord_quickmap() +
  theme_bw() + 
  scale_fill_gradientn(colours = viridis::viridis(99),
                       na.value = "black")


Save outputs

R saved dataset

save(mod.seq, file = "data/Ecological_Niche_Modeling/enm_output/ENMeval/GalaxENM.rda")

Save Raster

writeRaster(x = p, filename = "data/Ecological_Niche_Modeling/enm_output/ENMeval/GalaxENM.asc",
            format = "ascii", NAflag = -9999, overwrite = TRUE)

07-ENM Processing

Processing Ecological Niche Models in R.
Script by Anthony Melton and ML Gaynor.

Load Packages

library(raster)
library(gtools)
library(dplyr)
library(ENMTools)
library(ENMeval)
library(ape)
library(RStoolbox)
library(hypervolume)
library(phytools)

Load functions

The sourced file includes make.binary.map(), which generates a binary predicted occurrence map. It was written by Anthony Melton.

source("functions/Functions_AEM.R")

Read the models you generated with the MaxEnt GUI into R.

sp1_enm.mx.b <- raster("data/Ecological_Niche_Modeling/enm_output/Galax_urceolata_avg.asc")
sp2_enm.mx.b <- raster("data/Ecological_Niche_Modeling/enm_output/Pyxidanthera_barbulata_avg.asc")
sp3_enm.mx.b <- raster("data/Ecological_Niche_Modeling/enm_output/Pyxidanthera_brevifolia_avg.asc")
sp4_enm.mx.b <- raster("data/Ecological_Niche_Modeling/enm_output/Shortia_galacifolia_avg.asc")

Read in Bioclim layers

These steps are described in the previous script

system("rm -rf data/climate_processing/PresentLayers/all/maxent.cache/")
list <- list.files("data/climate_processing/PresentLayers/all/", 
                   full.names = TRUE, recursive = FALSE) 
list <- mixedsort(sort(list))
allstack <- stack(list)
projection(allstack) <- "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"

ENM Breadth

Niche breadth describes the range of environmental conditions a species occupies, scaled from 0 to 1. Values closer to 1 indicate a generalist species with wide tolerances, while values closer to 0 indicate a more specialized species. The raster.breadth command in ENMTools measures the smoothness of suitability scores across a projected landscape; the higher the score, the more of the available niche space a species occupies.
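For intuition, here is a minimal base-R sketch of Levins' standardized breadth, the statistic raster.breadth() reports as B2. The levins_B2 helper and its toy inputs are hypothetical; the real call evaluates every cell of the projected raster.

```r
## Levins' standardized niche breadth (B2) on a vector of suitability scores:
## normalize the scores, then B2 = (1/sum(p^2) - 1) / (n - 1)
levins_B2 <- function(suit){
  p <- suit / sum(suit)
  (1 / sum(p^2) - 1) / (length(p) - 1)
}
levins_B2(rep(0.5, 4))    # uniform suitability -> 1 (generalist)
levins_B2(c(1, 0, 0, 0))  # suitability in a single cell -> 0 (specialist)
```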

sp1_breadth <- ENMTools::raster.breadth(x = sp1_enm.mx.b)
sp1_breadth$B2
## [1] 0.6369572
sp2_breadth <- ENMTools::raster.breadth(x = sp2_enm.mx.b)
sp2_breadth$B2
## [1] 0.1665568
sp3_breadth <- ENMTools::raster.breadth(x = sp3_enm.mx.b)
sp3_breadth$B2
## [1] 0.04380163
sp4_breadth <- ENMTools::raster.breadth(x = sp4_enm.mx.b)
sp4_breadth$B2
## [1] 0.4567

ENM Overlap

Calculating niche overlap, Schoener’s D, with ENMeval. Schoener’s D ranges from 0 to 1, where 0 represents no similarity between the projections and 1 represents completely identical projections.

Stack the projections and make sure the stack is named.

enm_stack.b <- stack(sp1_enm.mx.b, sp2_enm.mx.b, sp3_enm.mx.b, sp4_enm.mx.b)
names(enm_stack.b) <- c("Galax urceolata", "Pyxidanthera barbulata",  "Pyxidanthera brevifolia","Shortia galacifolia" )

Calculate

calc.niche.overlap(enm_stack.b, overlapStat = "D")
##                         Galax.urceolata Pyxidanthera.barbulata
## Galax.urceolata                      NA                     NA
## Pyxidanthera.barbulata        0.2108513                     NA
## Pyxidanthera.brevifolia       0.1287981              0.1614858
## Shortia.galacifolia           0.6447093              0.1228339
##                         Pyxidanthera.brevifolia Shortia.galacifolia
## Galax.urceolata                              NA                  NA
## Pyxidanthera.barbulata                       NA                  NA
## Pyxidanthera.brevifolia                      NA                  NA
## Shortia.galacifolia                  0.03923816                  NA

Phylogenetic correlations

Let’s look at niche overlap over a tree!

Load the tree file

tree <- ape::read.tree(file = 
                         "data/Ecological_Niche_Modeling/AOC_Test_Demo/diapensiaceae_subset.tre")
tree <- ape::compute.brlen(phy = tree, method = "grafen")
plot(tree)

Drop the outgroup

tree <- drop.tip(tree, "Cyrilla_racemiflora")
plot(tree)

Load data file

alldf <- read.csv("data/cleaning_demo/maxent_ready/diapensiaceae_maxentready_20220712.csv")

Subset for each species

Galax_urceolata <- dplyr::filter(alldf, name == "Galax urceolata")
Pyxidanthera_barbulata <- dplyr::filter(alldf, name == "Pyxidanthera barbulata")
Pyxidanthera_brevifolia <- dplyr::filter(alldf, name == "Pyxidanthera brevifolia")
Shortia_galacifolia <- dplyr::filter(alldf, name == "Shortia galacifolia")

Generate species objects for each tree member!

sp1 <- enmtools.species(species.name = "Galax_urceolata",
                        presence.points = Galax_urceolata[, 3:2])
sp1$range <- background.raster.buffer(sp1$presence.points, 25000, mask = allstack)
sp1$background.points <- background.points.buffer(points = sp1$presence.points, 
                                                  radius = 5000, n = 10000, 
                                                  mask = allstack[[1]])
##############################
sp2 <- enmtools.species(species.name = "Pyxidanthera_barbulata",
                        presence.points = Pyxidanthera_barbulata[, 3:2])
sp2$range <- background.raster.buffer(sp2$presence.points, 25000, mask = allstack)
sp2$background.points <- background.points.buffer(points = sp2$presence.points, 
                                                  radius = 5000, n = 10000, 
                                                  mask = allstack[[1]])
#############################
sp3 <- enmtools.species(species.name = "Pyxidanthera_brevifolia",
                        presence.points = Pyxidanthera_brevifolia[, 3:2])
sp3$range <- background.raster.buffer(sp3$presence.points, 25000, mask = allstack)
sp3$background.points <- background.points.buffer(points = sp3$presence.points, 
                                                  radius = 5000, n = 10000, 
                                                  mask = allstack[[1]])
#############################
sp4 <- enmtools.species(species.name = "Shortia_galacifolia",
                        presence.points = Shortia_galacifolia[, 3:2])
sp4$range <- background.raster.buffer(sp4$presence.points, 25000, mask = allstack)
sp4$background.points <- background.points.buffer(points = sp4$presence.points, 
                                                  radius = 5000, n = 10000, 
                                                  mask = allstack[[1]])

Create “clade” object with all the species in the tree

clade <- enmtools.clade(species = list(sp1, sp2, sp3, sp4), tree = tree)
check.clade(clade)
## 
## 
## An enmtools.clade object with 4 species
## 
## Species names: 
##   Shortia_galacifolia     Pyxidanthera_brevifolia     Pyxidanthera_barbulata      Galax_urceolata
## 
## Tree: 
## 
## Phylogenetic tree with 4 tips and 3 internal nodes.
## 
## Tip labels:
##   Shortia_galacifolia, Pyxidanthera_brevifolia, Pyxidanthera_barbulata, Galax_urceolata
## 
## Rooted; includes branch lengths.
## 
## 
## Data Summary: 
##                         species.names           in.tree presence background range  
## Shortia_galacifolia     Shortia_galacifolia     TRUE    205      10000      present
## Pyxidanthera_brevifolia Pyxidanthera_brevifolia TRUE    47       10000      present
## Pyxidanthera_barbulata  Pyxidanthera_barbulata  TRUE    332      10000      present
## Galax_urceolata         Galax_urceolata         TRUE    482      10000      present

Age-Range Correlation Test

range.aoc <- enmtools.aoc(clade = clade,  nreps = 10, overlap.source = "range")
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.
## `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.
summary(range.aoc)
## 
## 
## Age-Overlap Correlation test
## 
## 10 replicates 
## 
## p values:
##      (Intercept) empirical.df$age 
##        0.1363636        0.1363636

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## NULL

Age-Overlap Correlation Test

glm.aoc <- enmtools.aoc(clade = clade,  env = allstack, nreps = 10, overlap.source = "glm")
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.
## `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.
summary(glm.aoc)
## 
## 
## Age-Overlap Correlation test
## 
## 10 replicates 
## 
## p values:
##      (Intercept) empirical.df$age 
##        0.2727273        0.2727273

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## NULL

The results here are pretty meaningless since we’re looking at very few species, but it serves the purpose of the demo. For range AOCs, an intercept >0.5 and a negative slope are indicative of sympatric speciation, while an intercept <0.5 and a positive slope are indicative of non-sympatric speciation. A low intercept and positive slope for niche overlap would indicate niche divergence.

Hypervolume

This will generate a binary map in which any cell with a suitability score greater than or equal to the lowest score at an occurrence point is treated as a presence (the minimum training presence threshold). There are other ways to set the threshold, including percentiles or training statistics.
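The thresholding logic can be sketched in base R on a toy suitability matrix. The names suit, occ_suit, threshold, and binary are hypothetical; make.binary.map() applies the same rule after extracting scores at the occurrence points.

```r
## Toy suitability scores standing in for the model raster
suit <- matrix(c(0.10, 0.42, 0.75,
                 0.05, 0.30, 0.90,
                 0.22, 0.55, 0.68), nrow = 3, byrow = TRUE)
## Suitability scores extracted at the occurrence points
occ_suit <- c(0.42, 0.90, 0.55)
## Minimum training presence threshold: lowest score at an occurrence
threshold <- min(occ_suit)  # 0.42
## Any cell at or above the threshold becomes a presence (1)
binary <- ifelse(suit >= threshold, 1, 0)
binary
```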

Create the binary maps

sp1.dist <- make.binary.map(model = sp1_enm.mx.b, occ.dat = Galax_urceolata)
sp2.dist <- make.binary.map(model = sp2_enm.mx.b, occ.dat = Pyxidanthera_barbulata)
sp3.dist <- make.binary.map(model = sp3_enm.mx.b, occ.dat = Pyxidanthera_brevifolia)
sp4.dist <- make.binary.map(model = sp4_enm.mx.b, occ.dat = Shortia_galacifolia)

Plot

par(mfrow = c(2,2),  mar = c(1,1,1,1))
plot(sp1.dist)
plot(sp2.dist)
plot(sp3.dist)
plot(sp4.dist)

Hypervolume

Next, let’s work on getting some data from the predicted distributions! Niche space can be thought of as a multi-dimensional hypervolume. We’re using climatic data in this case, so we’re measuring the hypervolume of climatic niche space occupied by these species.
Warning: This takes forever!

sp1.hv <- get_hypervolume(binary_projection = sp1.dist, envt = allstack)
sp2.hv <- get_hypervolume(binary_projection = sp2.dist, envt = allstack)
sp3.hv <- get_hypervolume(binary_projection = sp3.dist, envt = allstack)
sp4.hv <- get_hypervolume(binary_projection = sp4.dist, envt = allstack)

Compare the hypervolumes

hv_set <- hypervolume_set(hv1 = sp1.hv, hv2 = sp2.hv, check.memory = FALSE)
hypervolume_overlap_statistics(hv_set)
plot(hv_set)
get_volume(hv_set)
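For intuition on the overlap statistics reported by hypervolume_overlap_statistics(), the Jaccard and Sørensen similarities follow directly from the unique and shared volumes. The volumes below are toy values, not this demo's results.

```r
## Toy volumes (hypothetical, not output from this demo)
v_unique_1 <- 4.0  # volume unique to the first hypervolume
v_unique_2 <- 2.0  # volume unique to the second hypervolume
v_shared   <- 6.0  # volume of their intersection
## Jaccard: shared volume over the volume of the union
jaccard  <- v_shared / (v_unique_1 + v_unique_2 + v_shared)          # -> 0.5
## Sorensen: twice the shared volume over the sum of both volumes
sorensen <- 2 * v_shared / (2 * v_shared + v_unique_1 + v_unique_2)  # -> 2/3
c(jaccard = jaccard, sorensen = sorensen)
```

Both similarities range from 0 (no shared volume) to 1 (identical hypervolumes).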